Testing APSyn against Vector Cosine on Similarity Estimation

Authors

  • Enrico Santus
  • Emmanuele Chersoni
  • Alessandro Lenci
  • Chu-Ren Huang
  • Philippe Blache

Abstract

In Distributional Semantic Models (DSMs), Vector Cosine is widely used to estimate similarity between word vectors, although this measure has been observed to suffer from several shortcomings. The recent literature has proposed other methods that attempt to mitigate these shortcomings. In this paper, we investigate APSyn, a measure that computes the extent of the intersection between the most associated contexts of two target words, weighting it by context relevance. We evaluated this metric in a similarity estimation task on several popular test sets, and our results show that APSyn is in fact highly competitive, even with respect to the results reported in the literature for word embeddings. Moreover, APSyn addresses some of the weaknesses of Vector Cosine and also performs well on genuine similarity estimation.
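
To make the contrast between the two measures concrete, here is a minimal sketch in Python. The abstract only says that APSyn intersects the most associated contexts of two words and weights the intersection by context relevance. The details below are assumptions made for illustration: word vectors are sparse dictionaries of association scores (for example PPMI), only the top n contexts of each word are kept, and each shared context contributes the inverse of its average rank in the two lists. The function names and the toy data are hypothetical.

```python
import math


def top_ranks(vector, n):
    """Map each of the n highest-scoring contexts to its 1-based rank."""
    # Sort by descending score; break ties alphabetically for determinism.
    ordered = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return {ctx: rank for rank, (ctx, _) in enumerate(ordered[:n], start=1)}


def apsyn(vec1, vec2, n=1000):
    """Rank-weighted overlap of the top-n contexts (assumed formulation)."""
    ranks1 = top_ranks(vec1, n)
    ranks2 = top_ranks(vec2, n)
    shared = ranks1.keys() & ranks2.keys()
    # Each shared context adds 1 / (average of its two ranks).
    return sum(1.0 / ((ranks1[c] + ranks2[c]) / 2.0) for c in shared)


def cosine(vec1, vec2):
    """Standard vector cosine over sparse dictionaries."""
    dot = sum(score * vec2.get(ctx, 0.0) for ctx, score in vec1.items())
    norm1 = math.sqrt(sum(s * s for s in vec1.values()))
    norm2 = math.sqrt(sum(s * s for s in vec2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Toy association scores (made up for illustration only).
dog = {"bark": 5.0, "pet": 4.0, "tail": 3.0, "the": 0.5}
cat = {"meow": 5.0, "pet": 4.5, "tail": 2.0, "the": 0.4}

apsyn_score = apsyn(dog, cat, n=3)
cosine_score = cosine(dog, cat)
```

With n=3, the low-scoring context "the" falls outside both top lists, so it has no effect on the APSyn score, whereas it still contributes a small amount to the cosine. Under these assumptions, that difference is the kind of behaviour the abstract attributes to restricting the comparison to the most associated contexts.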

Similar resources

What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets

In this paper, we claim that Vector Cosine – which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models – can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting such intersection according to the rank of the s...

Unsupervised Measure of Word Similarity: How to Outperform Co-Occurrence and Vector Cosine in VSMs

In this paper, we claim that vector cosine – which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models – can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of th...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Beyond its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Cosine Similarity Measure of Interval Valued Neutrosophic Sets

In this paper, we define a new cosine similarity between two interval valued neutrosophic sets based on Bhattacharya’s distance [19]. The notions of interval valued neutrosophic sets (IVNS, for short) will be used as vector representations in 3D-vector space. Based on the comparative analysis of the existing similarity measures for IVNS, we find that our proposed similarity measure is better an...

Improved cosine similarity measures of simplified intuitionistic sets for medicine diagnoses

Similarity measures are an important tool in pattern recognition and medical diagnosis. To overcome some disadvantages of existing cosine similarity measures for simplified neutrosophic sets (SNSs) in vector space, this paper proposes improved cosine similarity measures for SNSs based on the cosine function, including single valued neutrosophic cosine similarity measures and interval neutrosoph...

Journal:
  • CoRR

Volume: abs/1608.07738

Publication date: 2016